Description

[0001] The present invention relates to a novel DNA coding for sorbitol dehydrogenase of a microorganism belonging
to acetic acid bacteria including the genus Gluconobacter and the genus Acetobacter, an expression vector containing
the said DNA, and recombinant organisms containing the said expression vector. Furthermore, the present invention
relates to a process for producing recombinant D-sorbitol dehydrogenase and a process for producing L-sorbose by
utilizing the said recombinant enzymes or recombinant organisms containing the said expression vector.
[0002] L-Sorbose is an important intermediate in the actual industrial process of vitamin C production, which is mainly
practiced by the Reichstein method (Helvetica Chimica Acta, 17: 311, 1934). In the process, the only microbial conversion
is the L-sorbose production from D-sorbitol by Gluconobacter or Acetobacter strains. The conversion is considered to
be carried out by NAD/NADP-independent D-sorbitol dehydrogenase (SLDH). L-Sorbose is also a well-known substrate
in the art for microbiologically producing 2-keto-L-gulonic acid, which is a useful intermediate in the production of
vitamin C.

[0003] It is known that there are NAD/NADP-independent D-sorbitol dehydrogenases which catalyze the oxidation of
D-sorbitol to L-sorbose. Such a D-sorbitol dehydrogenase was isolated and characterized from Gluconobacter
suboxydans var. α IFO 3254 (E. Shinagawa et al., Agric. Biol. Chem., 46: 135-141, 1982), and found to consist of three
subunits with molecular weights of 63 kDa, 51 kDa, and 17 kDa; the largest subunit is a dehydrogenase containing
FAD as a cofactor, the second one is cytochrome c, and the smallest one is a protein with unknown function;
the enzyme shows its optimal pH at 4.5. Such SLDH was also purified and characterized from G. suboxydans ATCC 621
(KCTC 2111) (E-S. Choi et al., FEMS Microbiol. Lett. 125: 45-50, 1995) and found to consist of three subunits with
molecular weights of 75 kDa, 50 kDa, and 14 kDa: the large subunit is a dehydrogenase containing pyrroloquinoline
quinone (PQQ) as a cofactor, the second one is cytochrome c, and the small one is a protein with unknown function. The
inventors also purified and characterized the NAD/NADP-independent SLDH from G. suboxydans IFO 3255 (T.
Hoshino et al., EP 728840); this SLDH consists of one kind of subunit with a molecular weight of 79.0 +/- 0.5 kDa,
shows its optimal pH at 6 to 7, and shows dehydrogenase activity on mannitol and glycerol as well as on D-sorbitol.
[0004] Although several SLDHs have been purified, their genes have not been cloned yet. It is useful to clone the
SLDH gene for efficient production of the SLDH enzyme and for constructing recombinant organisms having enhanced
SLDH activity to improve the production yield of L-sorbose. It is also useful to introduce the said SLDH gene into desired
organisms, for example, Gluconobacter converting L-sorbose to 2-keto-L-gulonic acid, for constructing recombinant
microorganisms which directly produce 2-keto-L-gulonic acid from D-sorbitol.

[0005] The present invention provides a nucleotide sequence (gene) coding for D-sorbitol dehydrogenase (SLDH)
originating from a microorganism belonging to acetic acid bacteria including the genus Gluconobacter and the genus
Acetobacter; a DNA molecule comprising said nucleotide sequence as well as a combination of the said DNA with a
DNA comprising a nucleotide sequence of a protein functional in activating the said SLDH in vivo; expression vectors
carrying the DNA comprising the SLDH nucleotide sequence or the said combination of the DNAs; recombinant organisms
carrying the expression vectors; a process for producing the recombinant SLDH; and a process for producing L-sorbose
by utilizing the recombinant SLDH or the recombinant organisms.

[0006] The present invention is also directed to functional derivatives of the present case. Such functional derivatives
are defined on the basis of the amino acid sequences of the present invention by addition, insertion, deletion and/or
substitution of one or more amino acid residues of such sequences wherein such derivatives still have SLDH activity
measured by an assay known in the art or specifically described herein. Such functional derivatives can be made either
by chemical peptide synthesis known in the art or by recombinant means on the basis of the DNA sequences as
disclosed herein by methods known in the state of the art and disclosed e.g. by Sambrook et al. (Molecular Cloning, Cold
Spring Harbour Laboratory Press, New York, USA, second edition 1989). Amino acid exchanges in proteins and
peptides which do not generally alter the activity of such molecules are known in the state of the art and are described, for
example, by H. Neurath and R. L. Hill in "The Proteins" (Academic Press, New York, 1979, see especially Figure 6,
page 14). The most commonly occurring exchanges are: Ala/Ser, Val/Ile, Asp/Glu, Thr/Ser, Ala/Gly, Ala/Thr, Ser/Asn,
Ala/Val, Ser/Gly, Tyr/Phe, Ala/Pro, Lys/Arg, Asp/Asn, Leu/Ile, Leu/Val, Ala/Glu, Asp/Gly as well as these in reverse.
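
As an illustration only, the exchange list in paragraph [0006] can be held as a set of unordered pairs, so that "these in reverse" is covered automatically. The sketch below is a minimal lookup under that reading; the function name `is_listed_exchange` is a hypothetical helper, and a listed exchange is not a guarantee that SLDH activity is retained in any particular derivative.

```python
# Exchange pairs transcribed from the list in paragraph [0006].
EXCHANGES = [
    "Ala/Ser", "Val/Ile", "Asp/Glu", "Thr/Ser", "Ala/Gly", "Ala/Thr",
    "Ser/Asn", "Ala/Val", "Ser/Gly", "Tyr/Phe", "Ala/Pro", "Lys/Arg",
    "Asp/Asn", "Leu/Ile", "Leu/Val", "Ala/Glu", "Asp/Gly",
]

# A frozenset ignores order, which covers "as well as these in reverse".
PAIRS = {frozenset(pair.split("/")) for pair in EXCHANGES}


def is_listed_exchange(original: str, replacement: str) -> bool:
    """Return True if the substitution is one of the commonly occurring exchanges."""
    return frozenset((original, replacement)) in PAIRS


print(is_listed_exchange("Ser", "Ala"))  # listed, in reverse order
print(is_listed_exchange("Trp", "Gly"))  # not in the list
```

Such a lookup could serve as a first filter when proposing substitutions; activity of any resulting derivative would still have to be confirmed by assay.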
[0007] Furthermore, the present invention is directed to DNA sequences encoding the polypeptides with SLDH activity
and polypeptides functional in activating the said SLDH in vivo as disclosed e.g. in the sequence list as SEQ ID NO:2
and NO:3, as well as their complementary strands, or those which include these sequences; DNA sequences which
hybridize under standard conditions with such sequences or fragments thereof; and DNA sequences which, because of
the degeneracy of the genetic code, do not hybridize under standard conditions with such sequences but which code
for polypeptides having exactly the same amino acid sequence.

[0008] "Standard conditions" for hybridization mean in this context the conditions which are generally used by a man
skilled in the art to detect specific hybridization signals and which are described, e.g. in Molecular Cloning, Cold Spring
Harbour Laboratory Press, New York, USA, second edition 1989, or preferably so-called stringent hybridization and
stringent washing conditions which a man skilled in the art is familiar with and which are described, e.g. in Sambrook et al.
(s. a.).

In the following a brief description of the drawings is given. 

Figure 1 illustrates the partial amino acid sequences of SLDH polypeptide and oligonucleotides. The amino acid
sequences in boldface in the figure were used for synthesizing two oligonucleotide sequences (S7 and S6R) for
PCR. Arrow shows direction of DNA synthesis. All primers were degenerate DNA mixtures having bias for
Gluconobacter codon usage.

Figure 2 illustrates a restriction map of the SLDH gene cloned in the present invention and the genetic structure of 
the DNA region encoding SLDH and ORF2. 

Figure 3 illustrates the nucleotide sequence encoding SLDH and ORF2 with upstream and downstream sequences 
and illustrates deduced amino acid sequences of SLDH and ORF2. Figure 3 also illustrates putative ribosome- 
binding sites (SD sequences) of SLDH and ORF2 genes and the possible transcription terminator sequences. 

Figure 4 illustrates the steps for constructing plasmids pTNB114, pTNB115, and pTNB116 for the expression of
SLDH and/or ORF2 genes in E. coli under the control of lac promoter.

Figure 5 illustrates the steps for constructing plasmid pTNB110, which carries ORF2 and SLDH genes under the
control of a common pA promoter (described below in Example 6), and plasmid pTNB143, which carries the ORF2 gene
under the control of a pA promoter located immediately upstream of that gene and the SLDH gene under the control of
another pA promoter located immediately upstream of the SLDH gene.

[0009] The inventors have isolated SLDH gene together with a gene functionable in developing SLDH activity in vivo 
by DNA recombinant techniques and determined the nucleotide sequences. 

[0010] The present invention provides DNA sequences encoding Gluconobacter SLDH and a gene functionable in
developing SLDH activity in vivo, hereinafter referred to as ORF2 gene, expression vectors containing the said DNA for
SLDH, and host cells carrying the said expression vectors.

[0011] Briefly, the SLDH and/or ORF2 gene(s), the DNA molecule containing the said gene(s), the recombinant 
expression vector and the recombinant organism utilized in the present invention can be obtained by the following 
steps: 

(1) Isolating chromosomal DNA from the microorganisms which can provide SLDH activity that converts D-sorbitol 
to L-sorbose and constructing the gene library with the chromosomal DNA in an appropriate host cell, e. g. E. coli. 

(2) Cloning SLDH and/or ORF2 gene(s) from a chromosomal DNA by colony-, plaque-, or Southern-hybridization, 
PCR (polymerase chain reaction) cloning, Western-blot analysis and the like. 

(3) Determining the nucleotide sequence of the SLDH and/or ORF2 gene(s) obtained as above by conventional 
methods to select DNA molecule containing said SLDH and/or ORF2 gene(s) and constructing the recombinant 
expression vector on which SLDH and/or ORF2 gene(s) can express efficiently. 

(4) Constructing recombinant organisms carrying SLDH and/or ORF2 gene(s) by an appropriate method for
introducing DNA into host cell, e.g. transformation, transduction, transconjugation and/or electroporation.

[0012] The materials and the techniques used in the above aspect of the present invention are exemplified in detail
as follows:

[0013] A total chromosomal DNA can be purified by a procedure well known in the art. The aimed gene can be cloned
in either plasmid or phage vectors from a total chromosomal DNA typically by either of the following illustrative methods:

(i) The partial amino acid sequences are determined from the purified proteins or peptide fragments thereof. Such
whole protein or peptide fragments can be prepared by the isolation of such a whole protein or by
peptidase-treatment from the gel after SDS-polyacrylamide gel electrophoresis. Thus obtained protein or fragments thereof are
applied to a protein sequencer such as the Applied Biosystems automatic gas-phase sequencer 470A. The amino acid
sequences can be utilized to design and prepare oligonucleotide probes and/or primers with a DNA synthesizer such
as the Applied Biosystems automatic DNA synthesizer 381A. The said probes can be used for isolating clones carrying
the objective gene from a gene library of the strain carrying the objective gene with the aid of Southern-, colony- or
plaque-hybridization.

(ii) Alternatively, for the purpose of selecting clones expressing aimed protein from the gene library, immunological
methods with antibody prepared against the aimed protein can be applied.

(iii) The DNA fragment of the aimed gene can be amplified from the total chromosomal DNA by the PCR method with
a set of primers, i.e. two oligonucleotides synthesized according to the amino acid sequences determined as
above. Then a clone carrying the whole aimed gene can be isolated from the gene library constructed, e.g. in E.
coli, by Southern-, colony-, or plaque-hybridization with the PCR product obtained above as the probe.

(iv) A further alternative way of the cloning is screening of the clone complementing an SLDH-deficient strain
constructed by conventional mutation with chemical mutagenesis or by recombinant DNA techniques, e.g. with
transposon Tn5 to disrupt the aimed gene.
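
Methods (i) and (iii) above design probes or primers from a determined amino acid sequence. Because most residues are encoded by more than one codon, the number of distinct DNA sequences that could encode a peptide grows quickly with its length. The sketch below counts that number for the standard genetic code; the peptides used are hypothetical and are not taken from the sequences of this specification, and a codon-usage bias of the kind mentioned for Figure 1 would reduce the mixture actually synthesized.

```python
from collections import Counter
from math import prod

# Standard genetic code, one letter per codon in TCAG order (TTT, TTC, TTA, ...).
# Counting each letter gives the number of codons per amino acid.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS_PER_RESIDUE = Counter(AMINO)


def backtranslation_degeneracy(peptide: str) -> int:
    """Number of distinct DNA sequences encoding the peptide (standard code)."""
    peptide = peptide.upper()
    for residue in peptide:
        if residue == "*" or residue not in CODONS_PER_RESIDUE:
            raise ValueError(f"not a standard amino acid: {residue!r}")
    return prod(CODONS_PER_RESIDUE[residue] for residue in peptide)


# Hypothetical peptides, chosen only to show the contrast.
print(backtranslation_degeneracy("MWKDE"))  # few codons per residue
print(backtranslation_degeneracy("LSR"))    # six codons per residue
```

Peptide stretches rich in Met, Trp and two-codon residues give the least degenerate probes, which is one reason such stretches are usually preferred for primer design.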

[0014] DNA sequences which can be made by the polymerase chain reaction by using primers designed on the basis
of the DNA sequences disclosed herein by methods known in the art are also an object of the present invention. It is
understood that the DNA sequences of the present invention can also be made synthetically as described, e.g. in EP
747 483.

[0015] The above-mentioned antibody can be prepared with purified SLDH protein or its peptide fragment as an antigen
by a method such as that described in Methods in Enzymology, vol. 73, p 46, 1981.

[0016] Once a clone carrying the desired gene is obtained, the nucleotide sequence of the aimed gene can be
determined by a well known method such as the dideoxy chain termination method with M13 phage (Sanger F. et al., Proc. Natl.
Acad. Sci. USA, 74:5463-5467, 1977).

[0017] The desired gene expressing the D-sorbitol dehydrogenase activity of the present invention is illustrated in Fig.
2 and Fig. 3. This specific gene encodes the SLDH enzyme having 716 amino acid residues together with a signal
peptide of 24 amino acid residues. The inventors found an open reading frame just upstream of the above SLDH structural
gene and designated it as ORF2. This ORF2 gene encodes a protein having 126 amino acid residues, and was
suggested to be functional in providing the desired enzymatic activity to the recombinant SLDH of the present invention, in
particular when the said SLDH is expressed in a host cell of a different genus from acetic acid bacteria.
[0018] To express the desired gene/nucleotide sequence isolated from acetic acid bacteria including genus
Gluconobacter and genus Acetobacter efficiently, various promoters can be used; for example, the original promoter of
the gene, promoters of antibiotic resistance genes such as the kanamycin resistance gene of Tn5 (D. E. Berg and C. M.
Berg, 1983, Bio/Technology 1: 417-435) and the ampicillin resistance gene of pBR322, the promoter of β-galactosidase
of E. coli (lac), trp-, tac-, trc-promoters, promoters of lambda phage, and any promoters which can be functional in a host
organism. For this purpose, the host organism can be selected from microorganisms including bacteria such as
Escherichia coli, Pseudomonas putida, Acetobacter xylinum, Acetobacter pasteurianus, Acetobacter aceti, Acetobacter
hansenii, Gluconobacter albidus, Gluconobacter capsulatus, Gluconobacter cerinus, Gluconobacter dioxyacetonicus,
Gluconobacter gluconicus, Gluconobacter industrius, Gluconobacter melanogenus, Gluconobacter nonoxygluconicus,
Gluconobacter oxydans (e.g. Gluconobacter oxydans DSM 4025, which was deposited as DSM 4025 on March
17, 1987 under the conditions of the Budapest Treaty at the Deutsche Sammlung von Mikroorganismen und
Zellkulturen GmbH, Braunschweig, BRD), Gluconobacter oxydans subsp. sphaericus, Gluconobacter roseus, Gluconobacter
rubiginosus, and Gluconobacter suboxydans, as well as mammalian cells and plant cells.

[0019] For expression, other regulatory elements, such as a Shine-Dalgarno (SD) sequence (for example, AGGAGG 
etc. including natural and synthetic sequences operable in the host cell) and a transcriptional terminator (inverted 
repeat structure including any natural and synthetic sequence operable in the host cell) which are operable in the host 
cell into which the coding sequence will be introduced can be used with the above described promoters. 
[0020] For the expression of membrane-bound polypeptides, like the SLDH protein of the present invention, a signal
peptide, which contains usually 15 to 50 amino acid residues and is totally hydrophobic, is preferably associated. A DNA
encoding a signal peptide can be selected from any natural and synthetic sequence operable in the desired host cell.
[0021] A wide variety of host/cloning vector combinations may be employed in cloning the double-stranded DNA. The
cloning vector is generally a plasmid or phage which contains a replication origin, regulatory elements, a cloning site
including a multi-cloning site and selection markers such as antibiotic resistance genes including resistance genes for
ampicillin, tetracycline, kanamycin, streptomycin, gentamicin, spectinomycin etc.

[0022] A preferred vector for the expression of the gene of the present invention in E. coli is selected from any vectors
usually used in E. coli, such as pBR322 or its derivatives including pUC18 and pBluescript II (Stratagene Cloning
Systems, CA, USA), pACYC177 and pACYC184 (J. Bacteriol., 134: 1141-1156, 1978) and their derivatives, and a vector
derived from a broad-host-range plasmid such as RK2 (C. M. Thomas, Plasmid 5: 10, 1981) and RSF1010 (P. Guerry
et al., J. Bacteriol. 117: 619-630, 1974). A preferred vector for the expression of the nucleotide sequence of the present
invention in bacteria including Gluconobacter, Acetobacter and P. putida is selected from any vectors which can
replicate in Gluconobacter, Acetobacter and/or P. putida, as well as in a preferred cloning organism such as E. coli. The
preferred vector is a broad-host-range vector such as a cosmid vector like pVK100 (V. C. Knauf et al., Plasmid 8: 45-54,
1982) and its derivatives and RSF1010. Copy number and stability of the vector should be carefully considered for
stable and efficient expression of the cloned gene and also for efficient cultivation of the host cell carrying the cloned
gene. DNA molecules containing transposable elements such as Tn5 can also be used as a vector to introduce the
desired gene into the preferred host, especially on a chromosome. DNA molecules containing any DNAs isolated from
the preferred host together with the gene of the present invention are also useful to introduce this gene into the preferred
host, especially on a chromosome. Such DNA molecules can be transferred to the preferred host by applying any
conventional method, e.g. transformation, transduction, transconjugation or electroporation, which are well known to
those skilled in the art, considering the nature of the host and the DNA molecule.

[0023] Useful hosts may include microorganisms, mammalian cells, and plant cells and the like. As a preferable
microorganism, there may be mentioned bacteria such as E. coli, P. putida, A. xylinum, A. pasteurianus, A. aceti, A.
hansenii, Gluconobacter albidus, Gluconobacter capsulatus, Gluconobacter cerinus, Gluconobacter dioxyacetonicus,
Gluconobacter gluconicus, Gluconobacter industrius, Gluconobacter melanogenus, Gluconobacter nonoxygluconicus,
Gluconobacter oxydans, Gluconobacter oxydans subsp. sphaericus, Gluconobacter roseus, Gluconobacter
rubiginosus, Gluconobacter suboxydans, and any bacteria which are capable of expressing recombinant SLDH and/or
ORF2 gene(s). Functional equivalents, subcultures, mutants and variants of said microorganism can also be used in
the present invention. A preferred strain is E. coli K12 or its derivatives, P. putida, Gluconobacter, or Acetobacter
strains.

[0024] The SLDH and/or ORF2 gene(s)/nucleotide sequences provided in this invention are ligated into a suitable
vector containing a regulatory region such as a promoter, a ribosomal binding site and a transcriptional terminator
operable in the host cell described above by methods well known in the art to produce an expression vector. When the
SLDH and ORF2 genes are cloned in combination, the two genes can be cloned either in tandem or separately on the
same plasmid and also on the chromosomal DNA. Figures 4 and 5 exemplify the form of the combined cloning of
SLDH and ORF2 genes on the plasmid. One may also clone one gene on a plasmid and the other one on chromosomal
DNA.

[0025] To construct a recombinant microorganism carrying a recombinant expression vector, various gene transfer
methods including transformation, transduction, conjugal mating (Chapters 14 and 15, Methods for general and
molecular bacteriology, Philipp Gerhardt et al. ed., American Society for Microbiology, 1994), and electroporation can be
used. The method for constructing a recombinant organism may be selected from the methods well known in the field
of molecular biology. Usual transformation systems can be used for E. coli, Pseudomonas, Gluconobacter and
Acetobacter. A transduction system can also be used for E. coli. A conjugal mating system can be widely used in
Gram-positive and Gram-negative bacteria including E. coli, P. putida and Gluconobacter. A preferred conjugal mating is
disclosed in WO89/06688. The conjugation can occur in liquid medium or on a solid surface. The preferred recipient for
SLDH and/or ORF2 production is selected from E. coli, P. putida, Gluconobacter and Acetobacter. To the recipient for
conjugal mating, a selective marker is usually added; for example, resistance against nalidixic acid or rifampicin is
usually selected. Natural resistance can also be used; e.g. resistance against polymyxin B is useful for many
Gluconobacters.

[0026] The present invention provides recombinant SLDH. One can increase the production yield of the SLDH
enzyme by introducing the gene of SLDH provided by the present invention into organisms including acetic acid
bacteria including the genus Gluconobacter and the genus Acetobacter. One can also produce active SLDH in
microorganisms besides Gluconobacter by using the SLDH gene of the present invention in combination with the ORF2 gene of the
present invention. The recombinant SLDH can be immobilized on a solid carrier for solid phase enzyme reaction. The
present invention also provides recombinant organisms. One can produce L-sorbose from D-sorbitol with the
recombinant organisms. One can also produce L-sorbose from D-sorbitol even in a host organism besides acetic acid bacteria
by introducing the SLDH gene in combination with the ORF2 gene of the present invention.

Examples 

Example 1. Determination of amino acid sequences of SLDH 

(1) N-terminal amino acid sequence analyses of lysylendopeptidase-treated SLDH polypeptides 

[0027] Partial amino acid sequences of peptides No. 1, 3 and 8 prepared from SLDH protein were determined (Fig. 1;
SEQ ID NO:4 to 6). Purified SLDH of G. suboxydans IFO 3255 (T. Hoshino et al., EP 728840) was digested with
lysylendopeptidase; the reaction mixture (25 ml) contained 1.2 mM of lysylendopeptidase and 3 nmol of the SLDH
protein in 100 mM Tris-HCl buffer, pH 9.0, and the reaction was carried out at 37°C for 15 hours. The resulting peptide
fragments were separated by HPLC (column: protein C-4, VYDAC, California, USA) with an acetonitrile/isopropanol gradient
in 0.1% TFA at a flow rate of 1.0 ml/min. Elution of the peptides was monitored by UV absorbance at 214 nm, and the
peak fractions were collected manually and subjected to N-terminal amino acid sequence analysis with an amino acid
sequencer (Applied Biosystems model 470A, The Perkin Elmer Corp., Conn., USA).

Example 2. Cloning of partial SLDH gene by PCR 

[0028] PCR was conducted with chromosomal DNA of G. suboxydans IFO 3255 and degenerate oligonucleotide DNA
primers, s7 and s6R, whose sequences are shown in Fig. 1. The PCR amplification was carried out with thermostable
polymerase Amplitaq (Perkin-Elmer, Roche Molecular Systems Inc., NJ, USA), using a thermal cycler (ZYMO
REACTOR II AB-1820, ATTO, Tokyo, Japan). The following reaction mixture (50 µl) was used for PCR: 200 µM of dNTPs,
10 - 20 pmol of each primer (54 - 288 degeneracy), 6 ng of chromosomal DNA of G. suboxydans IFO 3255 and 0.5 unit
of the DNA polymerase in the buffer provided by the supplier. The reaction consisted of 30 cycles of 1) denaturation
step at 94°C for 1 min; 2) annealing step at 48°C for 1 min; 3) synthesis step at 72°C for 2 min. Consequently, a 1.6 kb
DNA was amplified and cloned in the E. coli vector pUC57/T (MBI Fermentas, Vilnius, Lithuania), which has 3'-ddT-tailed
ends for direct ligation of an amplified DNA fragment, to obtain a recombinant plasmid pMT20. The cloned DNA was
subjected to nucleotide sequencing by the method of dideoxy-chain termination (F. Sanger et al., Proc. Natl. Acad. Sci.
USA, 74:5463-5467, 1977); the 1.6 kb fragment encoded the peptides No. 3 and No. 8.
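
The "54 - 288 degeneracy" quoted for the primers is the number of distinct oligonucleotides present in each degenerate mixture. As a hedged illustration, the sketch below computes that number for a primer written with IUPAC ambiguity codes. The two primer strings are hypothetical; they were chosen only so that they reproduce the figures 54 and 288, and they are not the s7 and s6R sequences of Fig. 1, which are not reproduced in this text.

```python
from math import prod

# IUPAC nucleotide ambiguity codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def primer_degeneracy(primer: str) -> int:
    """Number of distinct sequences represented by a degenerate primer."""
    return prod(len(IUPAC[base]) for base in primer.upper())


# Hypothetical primers (assumptions for illustration, not taken from Fig. 1).
EXAMPLE_LOW = "ATGGARATHGTBGCH"         # one 2-fold and three 3-fold positions
EXAMPLE_HIGH = "TAYGARAARGAYTGYATHGCH"  # five 2-fold and two 3-fold positions

print(primer_degeneracy(EXAMPLE_LOW))
print(primer_degeneracy(EXAMPLE_HIGH))
```

At a fixed primer amount, a higher degeneracy means each individual sequence is present at a lower concentration, which is consistent with using a relatively low annealing temperature such as the 48°C stated above.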

Example 3. Complete cloning of the SLDH gene 

(1) Construction of gene library of G. suboxydans IFO 3255

[0029] The chromosomal DNA of G. suboxydans IFO 3255 was prepared from the cells grown on MB agar plate for
2 days. The chromosomal DNA (160 ng) was partially digested with 20 units of Eco RI in 500 µl of reaction mixture.
Portions of the sample were withdrawn at 5, 10, 15, 30, and 60 minutes and the degree of the digestion was checked
by agarose gel electrophoresis. The first four portions, which contained partially digested DNA fragments, were combined
and subjected to preparative gel electrophoresis (agarose: 0.6%). Fragments of 15 - 35 kb were cut out and
electroeluted from the gel. The eluate was filtered and precipitated with sodium acetate and ethanol at -80°C. The DNA
fragments were collected by centrifugation and suspended in 200 µl of 10 mM Tris-HCl, pH 8.0, buffer containing 1 mM
EDTA.

[0030] In parallel, 1.8 µg of a cosmid vector pVK100 was completely digested with Eco RI and treated with calf
intestine alkaline phosphatase. The linearized and dephosphorylated pVK100 was ligated with 15 - 35 kb Eco RI fragments
of the chromosomal DNA of G. suboxydans IFO 3255 (5 ng) with the ligation kit (Takara Shuzo, Kyoto, Japan) in 20
separate tubes under the condition recommended by the supplier to obtain highly polymerized DNA. The ligated DNA was
then used for in vitro packaging according to the method described by the supplier (Amersham Japan). The resulting
phage particles were used to infect E. coli ED8767, a host for the genomic library. Consequently, 4,271 colonies were
obtained and all of the colonies tested (20 colonies) possessed the insert DNAs with the average size of about 25 kb.
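
Whether a library of 4,271 clones with about 25 kb inserts is large enough can be estimated with the standard Clarke-Carbon relation, P = 1 - (1 - f)^N, where f is the insert size divided by the genome size and N is the number of clones. The sketch below applies it; the genome size of 3,000 kb is an assumption made purely for illustration, since the specification does not state the genome size of G. suboxydans IFO 3255, and the function names are hypothetical.

```python
import math


def coverage_probability(n_clones: int, insert_kb: float, genome_kb: float) -> float:
    """Probability that a given locus is present in at least one clone."""
    fraction = insert_kb / genome_kb
    return 1.0 - (1.0 - fraction) ** n_clones


def clones_needed(probability: float, insert_kb: float, genome_kb: float) -> int:
    """Smallest number of clones giving at least the requested probability."""
    fraction = insert_kb / genome_kb
    return math.ceil(math.log(1.0 - probability) / math.log(1.0 - fraction))


ASSUMED_GENOME_KB = 3000  # assumption for illustration; not stated in the text

print(coverage_probability(4271, 25, ASSUMED_GENOME_KB))
print(clones_needed(0.99, 25, ASSUMED_GENOME_KB))
```

Under that assumed genome size, a few hundred clones would already give 99% coverage, so a library of the stated size would be expected to contain the SLDH locus many times over.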

(2) Colony hybridization to obtain complete SLDH gene 

[0031] The cosmid library described above was screened to isolate the clone carrying the complete SLDH gene by the colony
hybridization method with the 32P-labeled 1.6 kb DNA of pMT20. One clone was isolated and designated pSLII, which carries
an insert of about 25 kb in the pVK100 vector. From pSLII, a 6.2 kb Pst I fragment was isolated and cloned into pUC18 to obtain
plasmid pUSLIIP. The 6.2 kb Pst I fragment containing ORF2 and SLDH genes was isolated from pUSLIIP and
subcloned into pCRII vector (Invitrogen Corporation, CA, USA) to produce pCRSLIIP. From the resulting plasmid, the DNA
fragment containing the 6.2 kb Pst I fragment was isolated as a Hind III-Xho I fragment and cloned between the Hind III and
Xho I sites of pVK100 to obtain pVKSLIIP.

(3) Expression of SLDH gene in E. coli 

[0032] To confirm whether the 6.2 kb Pst I fragment encodes the aimed SLDH, cells of E. coli JM109 carrying pUSLIIP
were subjected to Western-blot analysis with anti-SLDH antibody as described above. Immuno-positive proteins with a
molecular weight of about 80 kDa were observed in the transformant, indicating that the Pst I fragment encodes the
polypeptides with the molecular weight of the intact SLDH (79 kDa +/- 0.5 kDa).

(4) Construction of SLDH-deficient Gluconobacter, strain 26A11, as the test strain for SLDH-activity complementation

[0033] Transposon Tn5 mutagenesis was performed with G. melanogenus IFO 3293 as the parent. Tn5, a
transposable element coding for Kmr, causes null mutations at random on the DNA of its host organism and is widely used as a
mutagen in Gram-negative bacteria. IFO 3293 was selected as the parent for the following reasons: (i) it produced
L-sorbose from D-sorbitol, (ii) it showed an immuno-positive polypeptide of about 80 kDa in Western-blot analysis with the
antibody prepared against SLDH purified from G. suboxydans IFO 3255 and (iii) its frequency for generating Tn5
mutants was much higher than that of G. suboxydans IFO 3255.

[0034] G. melanogenus IFO 3293 was cultivated in a test tube containing 5 ml of the MB medium containing 25 g/l of
mannitol, 5 g/l of yeast extract (Difco Laboratories), 3 g/l of Bactopeptone (Difco) at 30°C overnight. E. coli HB101
(pRK2013) [D. H. Figurski, Proc. Natl. Acad. Sci. USA 76: 1648 - 1652, 1979] and E. coli HB101 (pSUP2021) [R.
Simon, et al., BIO/TECHNOL. 1: 784-791, 1983] were cultivated in test tubes containing 5 ml of LB medium with 50
µg/ml of kanamycin at 37°C overnight. The cells were separately collected by centrifugation and suspended in half
the volume of MB medium. The cell suspensions were mixed in the ratio of 1:1:1 and the mixture was placed on a
nitrocellulose filter on the surface of an MB agar plate. The plate was incubated at 27°C overnight and the resulting cells
on the filter were scraped, suspended in an appropriate volume of MB medium and spread on the selection plate (MPK
plate), MB containing 10 µg/ml of polymyxin B and 50 µg/ml of kanamycin. The MPK plate was incubated at 27°C for 3
to 4 days.

[0035] The resulting 3,436 Tn5 mutants were subjected to the immuno-dot blot screening with the anti-SLDH
antibody. The cells of each strain were independently suspended in 50 µl of Laemmli buffer consisting of 62.5 mM Tris-HCl
(pH 6.5), 10% glycerol, 2% SDS and 5% β-mercaptoethanol in a 96-well microtiter plate and incubated at 60°C for 2
hours. The Cell Free Extracts (CFEs) were stamped on the nitrocellulose filter and the immuno-positive samples in the
CFEs were screened with AP conjugate substrate Kit (Bio-RAD Laboratories, Richmond, Calif., USA). As a result, only
one strain, 26A11, without positive signal in the immuno-dot blot screening was obtained.

[0036] SLDH-deficiency of the strain 26A11 was confirmed by Western-blot analysis; it expressed at most 1/500
of the amount of SLDH expressed by its parent strain. 26A11 was not a complete SLDH-deficient strain but a strain with
the SLDH gene repressed by Tn5 insertion; the insertion site was found to be close to the C-terminus by determining the
nucleotide sequence around the Tn5-insertion point. Next, a resting cell reaction was conducted to examine the whole
SLDH activity in 26A11 and the wild Gluconobacter strains. In 100 mM potassium phosphate buffer (pH 7.0)
containing 2% D-sorbitol, 26A11 slightly converted D-sorbitol to L-sorbose, whereas the wild strains IFO 3293 and IFO
3255 converted it completely in 39.5 hr at 30°C.

(5) Expression of SLDH gene in 26A11

[0037] To confirm the SLDH activity of the SLDH clones obtained, a complementation test was conducted. Plasmids
pSLII and pVKSLIIP were introduced into 26A11 by conjugal mating. The transconjugant carrying pSLII or pVKSLIIP
restored the activity of SLDH in a mini-resting cell reaction and showed an immuno-reactive polypeptide of about 80 kDa
in Western-blot analysis.

(6) Nucleotide sequencing of the SLDH gene 

[0038] Plasmid pUSLIIP was used for nucleotide sequencing of SLDH and ORF2 genes. The determined nucleotide
sequence (SEQ ID NO: 1; 3,481 bp) revealed that the ORF of the SLDH gene (2,223 bp, nucleotide No. 572 to 2794 in SEQ
ID NO: 1) encoded the polypeptide of 740 amino acid residues (SEQ ID NO: 2), in which there were three amino acid
sequences (Peptides No. 1, 3, and 8 shown in SEQ ID NO: 4 to 6) determined from the purified SLDH polypeptide. In
addition to the SLDH ORF, one more ORF, ORF2, was found just upstream of the SLDH ORF as illustrated in Figs. 2 and
3. The ORF of ORF2 (381 bp, nucleotide No. 192 to 572 in SEQ ID NO: 1) encoded the polypeptide of 126 amino acid
residues (SEQ ID NO: 3).
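
The coordinates, lengths and residue counts given in paragraphs [0017] and [0038] can be cross-checked with simple arithmetic. The sketch below does so on the assumption that each stated nucleotide range is inclusive and ends with the stop codon; that assumption is not stated in the text but is the reading under which all the quoted numbers agree. The function name `orf_summary` is a hypothetical helper.

```python
def orf_summary(first_nt: int, last_nt: int) -> tuple[int, int]:
    """Return (length in bp, encoded residues) for an inclusive ORF range.

    Assumes the range includes the stop codon, so residues = codons - 1.
    """
    length_bp = last_nt - first_nt + 1
    if length_bp % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    return length_bp, length_bp // 3 - 1


sldh_bp, sldh_residues = orf_summary(572, 2794)
orf2_bp, orf2_residues = orf_summary(192, 572)

print(sldh_bp, sldh_residues)  # compare with 2,223 bp and 740 residues
print(orf2_bp, orf2_residues)  # compare with 381 bp and 126 residues

# Paragraph [0017]: mature enzyme plus signal peptide equals the precursor.
MATURE_RESIDUES = 716
SIGNAL_PEPTIDE_RESIDUES = 24
print(MATURE_RESIDUES + SIGNAL_PEPTIDE_RESIDUES == sldh_residues)
```

On this reading the two ranges share nucleotide No. 572, that is, the last nucleotide of the ORF2 range coincides with the first nucleotide of the SLDH ORF.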

[0039] The 4th amino acid of Peptide No. 1 was determined as Glu by the amino acid sequencer, but it was
Ala according to the DNA sequence. The 11th amino acid of Peptide No. 3 was determined as Gln by the
amino acid sequencer, but it was Pro according to the DNA sequence. A signal peptide-like region (SEQ ID NO: 8) is
possibly included in the deduced amino acid sequence: it contains (i) many hydrophobic residues, (ii) positively
charged residues near the N-terminus, and (iii) an Ala-Xaa-Ala site as a cleavage signal. The actual signal sequence was
determined as described in Example 3 (7). The putative ribosome-binding site (Shine-Dalgarno, SD, sequence) for the SLDH
gene was located 8 bp upstream of the initiation codon (AGAGGAG at nucleotide No. 558 - 564 of SEQ ID NO: 1).
The putative SD sequence for the ORF2 gene was located 10 bp upstream of the initiation codon (GGGAGG at
nucleotide No. 177 to 182 of SEQ ID NO: 1). There were some inverted repeat sequences immediately downstream of the SLDH
structural gene (nucleotide sequences of No. 2803 - 2833 and of No. 2838 - 2892) as illustrated in Fig. 3; they may
act as transcription termination loops for the SLDH gene. For the ORF2 gene, the inverted repeat sequence was found at
No. 684 - 704.
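
The SD positions given above can be reproduced from the sequence listing by a plain string search. The Python sketch below is only an illustration; the function name locate_sd_and_start is hypothetical, the two fragments are copied from SEQ ID NO: 1 (nucleotides 121 to 240 and 541 to 600), and the distance is counted from the last base of the SD sequence to the first base of the initiation codon, which is one possible reading of "bp upstream".

```python
def locate_sd_and_start(fragment: str, first_pos: int, sd: str) -> dict:
    """Locate an SD motif and the first ATG after it.

    fragment  -- part of SEQ ID NO: 1
    first_pos -- 1-based position of fragment[0] in SEQ ID NO: 1
    sd        -- SD motif to search for
    """
    sd_index = fragment.index(sd)
    atg_index = fragment.index("ATG", sd_index + len(sd))
    sd_start = first_pos + sd_index
    sd_end = sd_start + len(sd) - 1
    atg = first_pos + atg_index
    # Distance counted from the last SD base to the A of ATG.
    return {"sd": (sd_start, sd_end), "atg": atg, "distance": atg - sd_end}


# Nucleotides 121-240 of SEQ ID NO: 1 (region upstream of ORF2).
ORF2_REGION = (
    "ACGAAGTCCGATGGACCTGAACTGAATATGGATTTACCGTCCGGAGGATTCAGTTTGGGA"
    "GGCATTCGGTTATGCCAAATCTTCAAGGTAATAGGACTCTGACGGAGTGGCTGACGCTGC"
)
# Nucleotides 541-600 of SEQ ID NO: 1 (region upstream of the SLDH ORF).
SLDH_REGION = "AAAGCCGTCGTACTCTCAGAGGAGCCGTCTGATGCGCCGGCCTTACCTTCTAGCAACAGC"

orf2_site = locate_sd_and_start(ORF2_REGION, 121, "GGGAGG")
sldh_site = locate_sd_and_start(SLDH_REGION, 541, "AGAGGAG")

print("ORF2:", orf2_site)
print("SLDH:", sldh_site)
```

With this counting convention the search places GGGAGG at 177 to 182 with the ATG at 192 (distance 10), and AGAGGAG at 558 to 564 with the ATG at 572 (distance 8).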

[0040] Homology search for SLDH and ORF2 was performed with the programs of mp Blast (NCBI, Bethesda, Md.,
USA) and Motifs in GCG (Genetics Computer Group, University Research Park, WI, USA). The SLDH polypeptide had the
sequence commonly conserved in quinoproteins in the region near the C-terminus (within amino acid residue No. 632 to 692
of SEQ ID NO: 2) and another sequence identified as a quinoprotein motif (Prosite No. PS00363:
[DN]W.{3}G[RK].{6}[FY]S.{4}[LIVM]N.{2}NV.{2}L[RK]; amino acid residue No. 79 to 107 of SEQ ID NO: 2).

[0041] ORF2 showed homology with the N-terminal region of the membrane-bound PQQ-dependent D-glucose
dehydrogenases (GDH) of G. oxydans, E. coli, and Acinetobacter calcoaceticus, which is known as a membrane-spanning
region that binds the GDH to the membrane. Identities of ORF2 to the N-terminal region of the GDHs of G. oxydans,
E. coli, and A. calcoaceticus were 30%, 32%, and 37%, respectively. The ORF2 protein may function as an anchoring
protein that makes the SLDH a membrane-bound type.

Example 4. Determination of N-terminal and C-terminal sequences of mature SLDH polypeptide

[0042] Direct sequencing of the N-terminus gave no results, indicating that the N-terminus is blocked. Then, the SLDH
polypeptide was treated with the endoproteinase Lys-C (Wako, Osaka, Japan) in 0.1 M Tris-HCl at 37°C for 20 hours
with a substrate-to-enzyme ratio of 20:1 (w/w). The total digest was analyzed by reversed phase HPLC (RP300, 1 mm x 25
cm, Applied Biosystems, Foster City, CA) and each peak was subjected to mass spectrometry (TSQ700 triple quadrupole
instrument, Finnigan-MAT, San Jose, California) to determine its molecular weight. One of the digest fragments,
described as SEQ ID NO: 7, was assigned as the N-terminal sequence by mass spectrometry analysis and amino acid
composition analysis together with the amino acid composition predicted from the determined nucleotide sequence shown
as SEQ ID NO: 1. Further analysis with collision-induced dissociation (CID) was carried out to confirm the identity of
the peptide with the N-terminal sequence. The N-terminus was determined to be a pyroglutamyl residue.
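
The assignment of the N-terminal Lys-C fragment can be illustrated with a simplified in-silico digest. The Python sketch below is not the method used in the Example; the function name lys_c_digest is hypothetical, and it assumes that Lys-C cleaves after every lysine without exception. The input is the first 36 residues of the mature polypeptide of SEQ ID NO: 2 in one-letter code.

```python
def lys_c_digest(sequence: str) -> list[str]:
    """Split a one-letter protein sequence after every lysine (K).

    Simplifying assumption: every Lys-Xaa bond is cleaved.
    """
    peptides, current = [], ""
    for residue in sequence:
        current += residue
        if residue == "K":
            peptides.append(current)
            current = ""
    if current:
        peptides.append(current)  # C-terminal remainder without lysine
    return peptides


# Residues 1-36 of the mature SLDH polypeptide (SEQ ID NO: 2).
MATURE_N_TERMINUS = "QFAPAGAGGEPSSSVPGPGNASEPTENSPKSQSYFA"

first_peptide = lys_c_digest(MATURE_N_TERMINUS)[0]
print(first_peptide, len(first_peptide))
```

Under this assumption the first fragment is 30 residues long, starts with Gln and ends with Lys, which matches the peptide listed as SEQ ID NO: 7. An N-terminal Gln is also consistent with the pyroglutamyl residue reported above, since pyroglutamate can form by cyclization of N-terminal glutamine.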

[0043] Since the N-terminus of SLDH was determined to be Gln-Phe-Ala-Pro-Ala-Gly-Ala-Gly-Gly-Glu-Pro-Ser-Ser-
Ser-Val-Pro-Gly-Pro-Gly-Asn-Ala-Ser-Glu-Pro-Thr-Glu-Asn-Ser-Pro-Lys as shown in SEQ ID NO: 7, the signal
sequence was confirmed to be 24 amino acid residues long with the sequence Met-Arg-Arg-Pro-Tyr-Leu-Leu-Ala-Thr-
Ala-Ala-Gly-Leu-Ala-Leu-Ala-Cys-Ser-Pro-Leu-Ile-Ala-His-Ala, listed as SEQ ID NO: 8. The C-terminal sequence
was also determined, by using the peptide recovered from a V8 protease digest, to be Pro-Asp-Ala-Ile-Lys-Gln (SEQ ID
NO: 9).
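
The relationship between the initiation codon, the 24-residue signal sequence and the mature N-terminus can be checked by translating the start of the SLDH ORF. The Python sketch below is an illustration under stated assumptions: it uses the standard genetic code, the nucleotide string is copied from positions 572 to 660 of SEQ ID NO: 1, and the names translate, REGION and SIGNAL_LENGTH are introduced here only for the example.

```python
# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate complete codons of dna, ignoring trailing bases."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


ORF_START = 572  # first base of the SLDH initiation codon
# Nucleotides 572-660 of SEQ ID NO: 1.
REGION = (
    "ATGCGCCGGCCTTACCTTCTAGCAACAGC"
    "CGCAGGACTCGCCCTTGCCTGTTCGCCGCTCATCGCTCATGCACAGTTTGCTCCCGCAGG"
)
SIGNAL_LENGTH = 24  # residues, as stated for SEQ ID NO: 8

protein = translate(REGION)
signal, mature_start = protein[:SIGNAL_LENGTH], protein[SIGNAL_LENGTH:]
mature_nt_start = ORF_START + 3 * SIGNAL_LENGTH

print("signal peptide :", signal)
print("mature start   :", mature_start)
print("Ala-Xaa-Ala    :", signal[-3] == "A" and signal[-1] == "A")
print("mature coding region starts at nucleotide", mature_nt_start)
```

In this sketch the first 24 codons give the sequence of SEQ ID NO: 8, which ends in Ala-His-Ala, and the following residues begin with Gln-Phe-Ala-Pro-Ala as in SEQ ID NO: 7. The mature coding region then starts at nucleotide 644, the same position that appears in the claims.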

Example 5. Expression of the SLDH and/or ORF2 gene(s) in E. coli

[0044] From the pCRSLIIP described in Example 3 (2), plasmids carrying the SLDH gene with or without the ORF2 gene
under the lac promoter control were constructed as illustrated in Fig. 4. The resulting three plasmids are pTNB114,
carrying the SLDH and ORF2 genes; pTNB115, carrying the SLDH gene and an ORF2 gene truncated at its N-terminal region,
which contains the ribosome binding site and start codon (ATG); and pTNB116, carrying a mostly-truncated ORF2 gene and
the intact SLDH gene. These three plasmids were introduced into E. coli by conventional transformation. The production
of the SLDH polypeptide was detected by Western-blot analysis with cell free extracts of the resulting transformants,
and the SLDH activity was assayed with resting cells. The resting cell reaction was carried out in a reaction mixture
consisting of 0.3% NaCl, 1% CaCO3, 4% D-sorbitol, and 1 mM PMS with or without 1 µg/ml of PQQ at room temperature for
17 hours. The SLDH activity to produce L-sorbose was analyzed by TLC assay with Silica gel 60 F254, 0.25 mm (Merck),
with a developing solvent consisting of n-propanol-H2O-1% H3PO4-HCOOH (400:100:10:1) and a spray reagent of
naphthoresorcinol. Consequently, the SLDH polypeptide was detected in all transformants, even without ORF2 gene
expression, in Western-blot analysis, but SLDH activity to produce L-sorbose was detected only in the transformant
carrying pTNB114, containing the intact ORF2 gene, under the resting cell reaction condition in the presence of PQQ.
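
For readers who wish to scale the resting cell reaction mixture, the composition above can be converted into amounts per volume. The Python sketch below is only a convenience calculation and rests on assumptions not stated in the text: percentages are read as weight per volume (1% = 10 mg/ml), PMS is taken to be phenazine methosulfate with a molar mass of about 306.34 g/mol, and the function name reaction_mixture is hypothetical.

```python
PMS_MOLAR_MASS = 306.34  # g/mol, assuming PMS = phenazine methosulfate


def reaction_mixture(volume_ml: float, with_pqq: bool = True) -> dict[str, float]:
    """Return the amount of each component in milligrams for volume_ml.

    Percentages are interpreted as w/v, i.e. 1% = 10 mg/ml (assumption).
    """
    mg_per_ml_per_percent = 10.0
    amounts = {
        "NaCl": 0.3 * mg_per_ml_per_percent * volume_ml,
        "CaCO3": 1.0 * mg_per_ml_per_percent * volume_ml,
        "D-sorbitol": 4.0 * mg_per_ml_per_percent * volume_ml,
        # 1 mM = 1e-3 mol/L; multiplied by g/mol this gives g/L = mg/ml.
        "PMS": 1e-3 * PMS_MOLAR_MASS * volume_ml,
    }
    if with_pqq:
        amounts["PQQ"] = 0.001 * volume_ml  # 1 µg/ml = 0.001 mg/ml
    return amounts


mixture = reaction_mixture(10.0)
for name, mg in mixture.items():
    print(f"{name}: {mg:.4f} mg")
```

Calling reaction_mixture(10.0, with_pqq=False) gives the corresponding mixture for the reaction without PQQ.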

Example 6. Expression of the SLDH and/or ORF2 gene(s) in E. coli

[0045] Figure 5 illustrates the construction steps of pTNB110 and pTNB143. Plasmid pTNB110, carrying the ORF2 and SLDH
genes under control of the promoter of the Enzyme A gene (pA) of DSM4025 (T. Hoshino et al., European Patent
Application No. 9611500.8), was constructed by inserting a Hind III-Kpn I fragment containing pA and a Kpn I-Xho I
fragment from pTNB114 between the Hind III and Xho I sites of pUC18. Plasmid pTNB143, carrying the ORF2 and SLDH genes
as independent expression units (ORF2 gene with pA and SLDH gene with pA), was constructed by inserting a 0.9 kb Sal I
fragment from pTNB141 and a 3.2 kb Hind III-Xho I fragment from pTNB135 into a pUC57/pCRII hybrid vector (Sca I-Hind III
fragment with plac from pUC57 and Xho I to Sca I fragment without plac from pCRII) as shown in Fig. 5. The plasmids
pTNB110 and pTNB143 were introduced into E. coli by conventional transformation. The resulting transformants were
subjected to Western-blot analysis and resting cell reaction as described in Example 5. Consequently, both of the
transformants carrying pTNB110 and pTNB143 produced SLDH protein and showed SLDH activity to produce L-sorbose from
D-sorbitol in the presence of PQQ.

Example 7. Expression of the SLDH gene in G. oxydans DSM 4025

[0046] Plasmid pTNB136 was constructed for the expression of the SLDH gene in G. oxydans DSM 4025, a strain having
strong L-sorbose and L-sorbosone dehydrogenase activities together with weak D-sorbitol dehydrogenase activity to
produce L-sorbose. It was constructed by inserting the Hind III-Xho I fragment from pTNB135 (Fig. 5) between the
Hind III and Xho I sites of pVK100. The plasmid pTNB136 and its vector pVK100 were introduced by conjugal mating into
strain GOBAK, which is a mutant of G. oxydans DSM 4025 whose gene of Enzyme B (EP 832 974) having D-sorbitol
dehydrogenase activity to produce L-sorbose is deleted by replacing two Eco RI fragments containing the Enzyme B gene
with a kanamycin resistance gene cassette (1.28 kb Eco RI fragment of pUC4K; Pharmacia, Uppsala, Sweden). The gene
disruption was conducted by recombinant DNA techniques well-known in the art. The resulting transconjugants, GOBAK
carrying pTNB136 or pVK100, produced 5 g/L and less than 2 g/L of 2KGA, respectively, from 10% D-sorbitol by flask
fermentation conducted at 30°C for 4 days (medium: 10% D-sorbitol, 1.6% urea, 0.05% glycerol, 0.25% MgSO4·7H2O, 3% corn
steep liquor, 6.25% baker's yeast wet cells and 1.5% CaCO3). Western-blot analysis with anti-SLDH antibody revealed
that the transconjugants carrying pTNB136 expressed the immuno-reactive SLDH polypeptides.

SEQUENCE LISTING

(1) GENERAL INFORMATION 

(i) APPLICANT 

NAME: F. HOFFMANN-LA ROCHE AG 

STREET: Grenzacherstrasse 124

CITY: Basle 

COUNTRY: Switzerland 

POSTAL CODE: CH-4002 

TELEPHONE: 061 - 688 25 11 

FAX: 061 - 688 13 95

TELEX: 962292/965542 hlr c 

(ii) TITLE OF INVENTION: 
D-Sorbitol dehydrogenase gene 

(iii) NUMBER OF SEQUENCES: 9 

(iv) COMPUTER READABLE FORM: 

(A) MEDIUM TYPE: Floppy disk 

(B) COMPUTER: Macintosh 

(C) OPERATING SYSTEM: 

(D) SOFTWARE: MS word ver 5.1a

(v) CURRENT APPLICATION DATA:

(A) APPLICATION NUMBER:

(B) FILING DATE:

(C) CLASSIFICATION

(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 3481 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) ORIGINAL SOURCE:

ORGANISM: Gluconobacter suboxydans
STRAIN: IFO 3255

(iv) FEATURE:

FEATURE KEY: CDS
POSITION: 192..572
SEQUENCING METHOD: E
FEATURE KEY: CDS
POSITION: 572..2794
SEQUENCING METHOD: E


ACAAATCATA CTGGCGGCGC TGTAGTGACA ATTCCGGCGG GTTAAAGAGA ATATTTTTTT 60
GGTGACAGGC CACAACAAAT TTTTGTTACC TCAAACACAG TTTTGTTAGA GCATTTGAAA 120
ACGAAGTCCG ATGGACCTGA ACTGAATATG GATTTACCGT CCGGAGGATT CAGTTTGGGA 180
GGCATTCGGT TATGCCAAAT CTTCAAGGTA ATAGGACTCT GACGGAGTGG CTGACGCTGC 240
TTCTCGGGGT CATCGTCCTT CTTGTGGGCC TGTTCTTCGT CATTGGGGGT GCTGACCTCG 300
CGATGCTGGG CGGCTCTACC TACTATGTTC TCTGTGGCAT CCTCCTGGTT GCTAGCGGCG 360
TATTCATGCT CATGGGCCGC ACGCTTGGTG CCTTCCTGTA TCTGGGTGCC CTGGCCTACA 420
CGTGGGTCTG GTCCTTCTGG GAAGTCGGTT TCAGCCCCAT CGATCTTCTG CCCCGCGCTT 480
TCGGCCCGAC CATCCTTGGC ATTCTCGTTG CCCTGACCAT TCCGGTCCTG CGCCGCATGG 540
AAAGCCGTCG TACTCTCAGA GGAGCCGTCT GATGCGCCGG CCTTACCTTC TAGCAACAGC 600
CGCAGGACTC GCCCTTGCCT GTTCGCCGCT CATCGCTCAT GCACAGTTTG CTCCCGCAGG 660
GGCTGGCGGC GAACCTTCCT CGTCAGTTCC TGGGCCAGGA AATGCGAGCG AGCCCACCGA 720
AAACTCTCCG AAAAGTCAGA GCTACTTCGC AGGACCGTCG CCCTATGCCC CGCAGGCTCC 780
TGGCGTAAAC GCAGCCAACC TGCCGGACAT TGAGTCAATC GATCCCTCGC AGGTCCCGGC 840
CATGGCTCCG CAGCAGAGTG CCAATCCGGC ACGTGGAGAC TGGGTTGCTT ACGGACGTGA 900
CGATCATCAG ACGCGATACT CTCCGCTTTC GGAAATCACG CCTGAGAACG CAAGCAAGCT 960
CAAGGTCGCT TTCGTCTACC ACACGGGGAG TTATCCGCGT CCGGGACAGG TGAACAAATG 1020
GGCCGCCGAA ACCACGCCGA TCAAGGTTGG TGACGGTCTC TACACATGTT CCGCCATGAA 1080
CGACATCATC AAGCTGGATC CGGCTACGGG TAAGCAGATC TGGCGTCGGA ACGTGGATGT 1140
CAAATACCAC TCCATTCCCT ATACCGCTGC CTGTAAGGGT GTGACGTATT TCACGTCCTC 1200
CGTGGTGCCG GAAGGCCAGC CCTGCCACAA TCGCCTTATC GAAGGCACGC TGGATATGCG 1260
TCTGATTGCG GTTGACGCGG AGACAGGGGA TTTCTGCCCT AATTTCGGTC ATGGTGGTCA 1320
GGTCAACCTG ATGCAGGGTC TGGGTGAGTC TGTTCCGGGC TTCGTCTCCA TGACGGCACC 1380
TCCACCGGTC ATCAACGGCG TCGTGGTTGT AAACCACGAA GTGCTCGACG GTCAGCGCCG 1440
CTGGGCTCCG TCCGGTGTGA TCCGTGGTTA CGATGCTGAA AGTGGCAAAT TCGTATGGGC 1500
CTGGGACGTC AACAATTCCG GACGATCACA GCCAGCCTAC CGGGTAACCG TCATTACAGC 1560
CGTGGAACGC CGAATTCCTG GGCTACCTGA CAGGCGACAA CGAGGAGGGT CTCGTTTACG 1620
TCCCGACAGG AACTCTGCTG CTGACTATTA CAGCGCCCTG CGTAGTGATG CTGAAAACAA 1680
GGTGTCCTCC GCTGTTGTCG CCATTGACGT CAAGACGGGT TCTCCGCGCT GGGTCTTCCA 1740
GACGGCTCAT AAGGACGTCT GGGATTATGA CATCGGTTCA CAGGCGACCC TGATGGATAT 1800
GCCTGGCCCG GATGGCCAGA CGGTTCCTGC TCTCATCATG CCGACCAAGC GTGGCCAGAC 1860
GTTCGTGCTT GACCGTCGTA CCGGCAAGCC AATTCTGCCG GTTGAAGAAC GCCCAGCTCC 1920
GTCCCCTGGT GTTATTCCGG GTGACCCGCG TTCTCCGACG CAGCCATGGT CCGTCGGGAT 1980
GCCGGCCCTT CGCGTGCCGG ATCTGAAAGA GACAGACATG TGGGGTATGT CCCCCATCGA 2040
TCAGCTCTTC TGCCGTATCA AGTTCCGCCG TGCGAACTAT GTGGGTGAGT TCACACCACC 2100
GAGCGTTGAC AAGCCGTGGA TTGAATATCC GGGCTATAAC GGTGGCAGTG ACTGGGGCTC 2160
CATGTCCTAT GATCCGCAGT CCGGCATCCT GATTGCGAAC TGGAACATCA CACCGATGTA 2220
CGACCAGCTC GTAACCCGCA AGAAGGCAGA CTCCCTCGGC CTGATGCCGA TCGATGACCC 2280
CAACTTCAAG CCAGGTGGCG GTGGTGCCGA AGGTAACGGC GCCATGGACG GAACGCCTTA 2340
CGGTATCGTC GTGACACCGT TCTGGGATCA GTACACGGGC ATGATGTGCA ACCGTCCGCC 2400
CTACGGTATG ATCACAGCCA TCGACATGAA GCACGGCCAG AAGGTTCTGT GGCAGCATCC 2460
GCTCGGAACG GCTCGCGCCA ACGGTCCATG GGGTCTGCCA ACAGGTCTGC CATGGGAAAT 2520
CGGCACTCCG AACAATGGTG GTTCGGTTGT GACCGGTGGC GGTCTGATCT TCATCGGTGC 2580
GGCAACGGAT AACCAGATCC GCGCGATTGA TGAACACACT GGCAAGGTTG TCTGGAGCGC 2640
AGTCCTCCCC GGCGGCGGTC AGGCCAATCC GATGACGTAT GAAGCCAATG GTCACCAGTA 2700
CGTTGCCATC ATGGCTGGCG GTCATCACTT CATGATGACG CCAGTGTCTG ACCAGCTTGT 2760
GGTTTACGCA CTGCCGGATG CCATCAAGCA GTAATTAAGT CCTGTGGCGG ATGTGTCATG 2820
CATATCCGCC ACACTCCATC GTCAGAAGGA GACTTTCGTG CTAGCCATGC AGGGAAGTCT 2880
CCTTTTGACG TTTTTGGCTC TTTCCAGCGA GCGGGCAGTC TGAAACGGGG CTTCGTCTGG 2940
CTCGTACTTT CAGAATGGCT CGTCGCACCC TCATGACTGC CCACTCCCCC GTTATCTTGC 3000
AGGTTCTGCC AGCCCTCAGC ACGGGCGGCC TGGAGCGGGG AGCTATTGAA ATTGCGGCTG 3060
CCATCACACA GGCTGGTGGC AAGGCCATTG TCGCTTCGAA GACGGGTCCT CTTCTTGTGC 3120
AACTCCGCCA CGTCGGAGCA GTGCATGTGC CGCTGGATCT CAAATCGAAA TCGCCGTTTT 3180
CTGTTCGGCG CCGTGCCCGT GAACTCCAGA AACTGATCCG GGAGCAGCAG GTTGATCTGG 3240
TTCACGCCCG GTCCCGTATT CCGGCATGGG CCGCCTGGCT CGCCTGCCGC CGCGAGAACA 3300
TTCCTTTCGT GACAACGTGG CATGGCGTCC ACGAGGCTGG CTGGTGGGGC AAGAAATTCT 3360
ACAATTCGGT GCTGGCCCGG GGTGCAAGGG TCATCGCAAT TTCGCACTAC ATTTCCGGGC 3420
GTCTTTCAGG GCAGTACGGC GTTCAGGCAG ATCGTCTTCG AACCATTCCG CGTGGTGCCG 3480
A 3481

INFORMATION FOR SEQ ID NO:2:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 740 residues 

(B) TYPE: amino acid 

(C) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(iii) ORIGINAL SOURCE: 

ORGANISM: Gluconobacter suboxydans 

STRAIN: IFO 3255 

(iv) FEATURE: 

FEATURE KEY: sig peptide 
POSITION: -24..-1 
SEQUENCING METHOD: E 
FEATURE KEY: mat peptide 
POSITION: 1..716 

SEQUENCING METHOD: E 



Met Arg Arg Pro Tyr Leu Leu Ala Thr Ala Ala Gly Leu Ala Leu
-24             -20                 -15                 -10
Ala Cys Ser Pro Leu Ile Ala His Ala Gln Phe Ala Pro Ala Gly
                -5                  1               5
Ala Gly Gly Glu Pro Ser Ser Ser Val Pro Gly Pro Gly Asn Ala
            10                  15                  20
Ser Glu Pro Thr Glu Asn Ser Pro Lys Ser Gln Ser Tyr Phe Ala
            25                  30                  35
Gly Pro Ser Pro Tyr Ala Pro Gln Ala Pro Gly Val Asn Ala Ala
            40                  45                  50
Asn Leu Pro Asp Ile Glu Ser Ile Asp Pro Ser Gln Val Pro Ala
            55                  60                  65
Met Ala Pro Gln Gln Ser Ala Asn Pro Ala Arg Gly Asp Trp Val
            70                  75                  80
Ala Tyr Gly Arg Asp Asp His Gln Thr Arg Tyr Ser Pro Leu Ser
            85                  90                  95
Glu Ile Thr Pro Glu Asn Ala Ser Lys Leu Lys Val Ala Phe Val
            100                 105                 110
Tyr His Thr Gly Ser Tyr Pro Arg Pro Gly Gln Val Asn Lys Trp
            115                 120                 125
Ala Ala Glu Thr Thr Pro Ile Lys Val Gly Asp Gly Leu Tyr Thr
            130                 135                 140
Cys Ser Ala Met Asn Asp Ile Ile Lys Leu Asp Pro Ala Thr Gly
            145                 150                 155
Lys Gln Ile Trp Arg Arg Asn Val Asp Val Lys Tyr His Ser Ile
            160                 165                 170
Pro Tyr Thr Ala Ala Cys Lys Gly Val Thr Tyr Phe Thr Ser Ser
            175                 180                 185
Val Val Pro Glu Gly Gln Pro Cys His Asn Arg Leu Ile Glu Gly
            190                 195                 200
Thr Leu Asp Met Arg Leu Ile Ala Val Asp Ala Glu Thr Gly Asp
            205                 210                 215
Phe Cys Pro Asn Phe Gly His Gly Gly Gln Val Asn Leu Met Gln
            220                 225                 230
Gly Leu Gly Glu Ser Val Pro Gly Phe Val Ser Met Thr Ala Pro
            235                 240                 245
Pro Pro Val Ile Asn Gly Val Val Val Val Asn His Glu Val Leu
            250                 255                 260
Asp Gly Gln Arg Arg Trp Ala Pro Ser Gly Val Ile Arg Gly Tyr
            265                 270                 275
Asp Ala Glu Ser Gly Lys Phe Val Trp Ala Trp Asp Val Asn Asn
            280                 285                 290
Ser Gly Arg Ser Gln Pro Ala Tyr Arg Val Thr Val Ile Thr Ala
            295                 300                 305
Val Glu Arg Arg Ile Pro Gly Leu Pro Asp Arg Arg Gln Arg Gly
            310                 315                 320
Gly Ser Arg Leu Arg Pro Asp Arg Asn Ser Ala Ala Asp Tyr Tyr
            325                 330                 335
Ser Ala Leu Arg Ser Asp Ala Glu Asn Lys Val Ser Ser Ala Val
            340                 345                 350
Val Ala Ile Asp Val Lys Thr Gly Ser Pro Arg Trp Val Phe Gln
            355                 360                 365
Thr Ala His Lys Asp Val Trp Asp Tyr Asp Ile Gly Ser Gln Ala
            370                 375                 380
Thr Leu Met Asp Met Pro Gly Pro Asp Gly Gln Thr Val Pro Ala
            385                 390                 395
Leu Ile Met Pro Thr Lys Arg Gly Gln Thr Phe Val Leu Asp Arg
            400                 405                 410
Arg Thr Gly Lys Pro Ile Leu Pro Val Glu Glu Arg Pro Ala Pro
            415                 420                 425
Ser Pro Gly Val Ile Pro Gly Asp Pro Arg Ser Pro Thr Gln Pro
            430                 435                 440
Trp Ser Val Gly Met Pro Ala Leu Arg Val Pro Asp Leu Lys Glu
            445                 450                 455
Thr Asp Met Trp Gly Met Ser Pro Ile Asp Gln Leu Phe Cys Arg
            460                 465                 470
Ile Lys Phe Arg Arg Ala Asn Tyr Val Gly Glu Phe Thr Pro Pro
            475                 480                 485
Ser Val Asp Lys Pro Trp Ile Glu Tyr Pro Gly Tyr Asn Gly Gly
            490                 495                 500
Ser Asp Trp Gly Ser Met Ser Tyr Asp Pro Gln Ser Gly Ile Leu
            505                 510                 515
Ile Ala Asn Trp Asn Ile Thr Pro Met Tyr Asp Gln Leu Val Thr
            520                 525                 530
Arg Lys Lys Ala Asp Ser Leu Gly Leu Met Pro Ile Asp Asp Pro
            535                 540                 545
Asn Phe Lys Pro Gly Gly Gly Gly Ala Glu Gly Asn Gly Ala Met
            550                 555                 560
Asp Gly Thr Pro Tyr Gly Ile Val Val Thr Pro Phe Trp Asp Gln
            565                 570                 575
Tyr Thr Gly Met Met Cys Asn Arg Pro Pro Tyr Gly Met Ile Thr
            580                 585                 590
Ala Ile Asp Met Lys His Gly Gln Lys Val Leu Trp Gln His Pro
            595                 600                 605
Leu Gly Thr Ala Arg Ala Asn Gly Pro Trp Gly Leu Pro Thr Gly
            610                 615                 620
Leu Pro Trp Glu Ile Gly Thr Pro Asn Asn Gly Gly Ser Val Val
            625                 630                 635
Thr Gly Gly Gly Leu Ile Phe Ile Gly Ala Ala Thr Asp Asn Gln
            640                 645                 650
Ile Arg Ala Ile Asp Glu His Thr Gly Lys Val Val Trp Ser Ala
            655                 660                 665
Val Leu Pro Gly Gly Gly Gln Ala Asn Pro Met Thr Tyr Glu Ala
            670                 675                 680
Asn Gly His Gln Tyr Val Ala Ile Met Ala Gly Gly His His Phe
            685                 690                 695
Met Met Thr Pro Val Ser Asp Gln Leu Val Val Tyr Ala Leu Pro
            700                 705                 710
Asp Ala Ile Lys Gln
            715 716

INFORMATION FOR SEQ ID NO:3:



(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 126 residues
(B) TYPE: amino acid
(C) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein
(iii) FRAGMENT TYPE: internal fragment
(iv) ORIGINAL SOURCE:

ORGANISM: Gluconobacter suboxydans
STRAIN: IFO 3255

(v) FEATURE:

FEATURE KEY: mat peptide
POSITION: 1..126
SEQUENCING METHOD: E

Met Pro Asn Leu Gln Gly Asn Arg Thr Leu Thr Glu Trp Leu Thr
1               5                   10                  15
Leu Leu Leu Gly Val Ile Val Leu Leu Val Gly Leu Phe Phe Val
                20                  25                  30
Ile Gly Gly Ala Asp Leu Ala Met Leu Gly Gly Ser Thr Tyr Tyr
                35                  40                  45
Val Leu Cys Gly Ile Leu Leu Val Ala Ser Gly Val Phe Met Leu
                50                  55                  60
Met Gly Arg Thr Leu Gly Ala Phe Leu Tyr Leu Gly Ala Leu Ala
                65                  70                  75
Tyr Thr Trp Val Trp Ser Phe Trp Glu Val Gly Phe Ser Pro Ile
                80                  85                  90
Asp Leu Leu Pro Arg Ala Phe Gly Pro Thr Ile Leu Gly Ile Leu
                95                  100                 105
Val Ala Leu Thr Ile Pro Val Leu Arg Arg Met Glu Ser Arg Arg
                110                 115                 120
Thr Leu Arg Gly Ala Val
                125 126

INFORMATION FOR SEQ ID NO:4:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 8 residues
(B) TYPE: amino acid
(C) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide
(iii) FRAGMENT TYPE: internal fragment
(iv) ORIGINAL SOURCE:

ORGANISM: Gluconobacter suboxydans
STRAIN: IFO 3255

(v) FEATURE:

SEQUENCING METHOD: E

Lys Trp Ala Glu Glu Thr Xaa Pro

INFORMATION FOR SEQ ID NO:5:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 24 residues
(B) TYPE: amino acid
(C) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide
(iii) FRAGMENT TYPE: internal fragment
(iv) ORIGINAL SOURCE:

ORGANISM: Gluconobacter suboxydans
STRAIN: IFO 3255

(v) FEATURE:

SEQUENCING METHOD: E

Lys Ser Gln Ser Tyr Phe Ala Gly Pro Ser Gln Tyr Ala Pro Gln
1               5                   10                  15
Ala Pro Gly Val Asn Ala Xaa Asn Leu
                20              24

INFORMATION FOR SEQ ID NO:6:



(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 16 residues
(B) TYPE: amino acid
(C) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide
(iii) FRAGMENT TYPE: internal fragment
(iv) ORIGINAL SOURCE:

ORGANISM: Gluconobacter suboxydans
STRAIN: IFO 3255

(v) FEATURE:

SEQUENCING METHOD: E

Lys Val Leu Trp Gln His Pro Leu Gly Thr Ala Arg Xaa Asn Gly
1               5                   10                  15
Pro
16

INFORMATION FOR SEQ ID NO:7:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 30 residues
(B) TYPE: amino acid
(C) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide
(iii) FRAGMENT TYPE: N-terminal fragment
(iv) ORIGINAL SOURCE:

ORGANISM: Gluconobacter suboxydans
STRAIN: IFO 3255

(v) FEATURE:

SEQUENCING METHOD: E

Gln Phe Ala Pro Ala Gly Ala Gly Gly Glu Pro Ser Ser Ser Val
1               5                   10                  15
Pro Gly Pro Gly Asn Ala Ser Glu Pro Thr Glu Asn Ser Pro Lys
                20                  25                  30

INFORMATION FOR SEQ ID NO:8:



(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 24 residues
(B) TYPE: amino acid
(C) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide
(iii) ORIGINAL SOURCE:

ORGANISM: Gluconobacter suboxydans
STRAIN: IFO 3255

(iv) FEATURE:

FEATURE KEY: sig peptide
POSITION: 1..24
SEQUENCING METHOD: E

Met Arg Arg Pro Tyr Leu Leu Ala Thr Ala Ala Gly Leu Ala Leu
1               5                   10                  15
Ala Cys Ser Pro Leu Ile Ala His Ala
                20              24

INFORMATION FOR SEQ ID NO:9:



(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 6 residues
(B) TYPE: amino acid
(C) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide
(iii) FRAGMENT TYPE: C-terminal fragment
(iv) ORIGINAL SOURCE:

ORGANISM: Gluconobacter suboxydans
STRAIN: IFO 3255

(v) FEATURE:

SEQUENCING METHOD: E

Pro Asp Ala Ile Lys Gln
1               5   6



Claims 

1. A DNA comprising a nucleotide sequence which encodes a protein as defined by (a) or (b) and having sorbitol
dehydrogenase activity:

(a) a protein having the amino acid sequence from the position 1 to 716 of the sequence described in SEQ ID
NO: 2, or

(b) a protein derived from the protein of (a) by substitution, deletion, insertion or addition of one or more amino
acids in the amino acid sequence defined in (a).

2. A DNA according to claim 1, which encodes sorbitol dehydrogenase of a microorganism belonging to acetic acid
bacteria.

3. A DNA according to claim 2, wherein the said microorganism belongs to the genus Gluconobacter or the genus
Acetobacter.

4. A DNA according to any one of claims 1-4, which is selected from the group consisting of:

(e) a DNA comprising a nucleotide sequence from the position 644 to 2791 or from the position 572 to 2791 of
the sequence represented by SEQ ID NO: 1, and

(f) a DNA which is capable of hybridizing to the DNA defined in (e) and encodes the protein having the function
of the protein defined in (a) of claim 1.

5. A combination of DNAs comprising:

(1) a DNA as defined in any one of claims 1-4; and

(2) a DNA encoding ORF2 having a nucleotide sequence from the position 192 to 569 of the sequence
represented by SEQ ID NO: 1 and/or a DNA encoding an ORF2 derivative with substitution, deletion, insertion or
addition of one or more amino acids of the sequence defined in SEQ ID NO: 3 and with the function equivalent
to the one of ORF2.

6. A combination of DNAs according to claim 5 comprising:

(1) a DNA having a nucleotide sequence from the position 644 to 2791 or from the position 572 to 2791 of the
sequence represented by SEQ ID NO: 1; or a DNA which is capable of hybridizing to this DNA and which encodes
the protein having the function of the protein defined in claim 1; and

(2) a DNA encoding ORF2 having a nucleotide sequence from the position 192 to 569 of the sequence
represented by SEQ ID NO: 1 or a DNA encoding an ORF2 derivative with substitution, deletion, insertion or
addition of one or more amino acids of the sequence defined in SEQ ID NO: 3 with the function equivalent to the one
of ORF2.

7. An expression vector comprising a DNA as claimed in any one of claims 1-6.

8. A combination of DNAs according to claims 5 or 6, wherein the respective DNAs are separately carried on different
expression vectors or on the same expression vector.

9. A combination of DNAs according to claims 5 or 6, which are in a tandem form as described in SEQ ID NO: 1.

10. The expression vector of claim 7, which is functional in a microorganism selected from those belonging to the
genus Gluconobacter, Acetobacter or E. coli.

11. A recombinant organism which has been transformed by an expression vector as claimed in any one of claims 7 to
10.

12. A recombinant organism having the DNA as claimed in any one of claims 1 - 4 or the combination of the DNAs as
defined in any one of claims 5, 6, 8 or 9 on a chromosomal DNA of the host organism.

13. The recombinant organism as claimed in claim 11 or 12, wherein the host organism is a microorganism selected
from those belonging to the genus Gluconobacter, Acetobacter or E. coli.

14. A process for producing recombinant D-sorbitol dehydrogenase which comprises cultivating the recombinant
organism as claimed in any one of claims 11 - 13 in an appropriate medium and recovering the said recombinant
D-sorbitol dehydrogenase from the culture.

15. A recombinant D-sorbitol dehydrogenase produced by the expression of the DNA as claimed in any one of claims
1 - 4 or a combination of DNAs as claimed in any one of claims 5, 6, 8 or 9.

16. A recombinant D-sorbitol dehydrogenase produced by the process claimed in claim 14.

17. A recombinant D-sorbitol dehydrogenase according to claim 15 or 16, which is immobilized on a solid carrier for
solid phase enzymatic reaction.

18. A process for producing L-sorbose which comprises converting D-sorbitol into L-sorbose with the aid of the
recombinant D-sorbitol dehydrogenase claimed in claim 16 or 17.

19. A process for producing L-sorbose which comprises converting D-sorbitol into L-sorbose by fermentation of the
recombinant organism claimed in any one of claims 11 - 13 in an appropriate medium.

[Fig. 3a to Fig. 3d: nucleotide sequence of SEQ ID NO: 1 (nucleotide No. 1 to 3481) shown together with the deduced
amino acid sequences of ORF2 and SLDH in one-letter code. Labelled features: SD for ORF2 gene; ORF2; SD for SLDH gene;
SLDH ORF; signal sequence (24 a.a.); N-terminus; Peptide No. 1; Peptide No. 3; Peptide No. 8; inverted repeats IR1 and
IR2 (or 2').]

IR1 and IR2 (or 2') are inverted repeats as possible transcription terminators for ORF2 and SLDH genes,
respectively.